Multiple Camera


Disturbance-Free Surgical Video Generation from Multi-Camera Shadowless Lamps for Open Surgery

Kato, Yuna, Mori, Shohei, Saito, Hideo, Takatsume, Yoshifumi, Kajita, Hiroki, Isogawa, Mariko

arXiv.org Artificial Intelligence

Video recordings of open surgeries are in great demand for education and research. However, capturing unobstructed videos is challenging because surgeons frequently block the camera's field of view. To avoid occlusion, the camera's position and angle must be adjusted frequently, which is highly labor-intensive. Prior work has addressed this issue by installing multiple cameras on a shadowless lamp and arranging them to fully surround the surgical area, increasing the chance that some camera captures an unobstructed view. However, manual image alignment is needed in post-processing, since the camera configuration changes every time surgeons move the lamp for optimal lighting. This paper aims to fully automate this alignment task. The proposed method identifies frames in which the lighting system moves, realigns them, and selects the camera with the least occlusion to generate a video that consistently presents the surgical field from a fixed perspective. A user study involving surgeons demonstrated that videos generated by our method were superior to those produced by conventional methods in terms of the ease of confirming the surgical area and the comfort of video viewing. Our approach also improved video quality over existing techniques. Furthermore, we implemented several synthesis options for the proposed view-synthesis method and conducted a user study to assess surgeons' preferences for each option.
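
To make the pipeline concrete, here is a minimal Python sketch of the three steps the abstract describes: detecting lamp movement, realigning views, and selecting the least-occluded camera. It is not the paper's implementation; the occlusion scores are assumed to come from elsewhere (e.g., a hand/instrument segmentation network), and all function names are hypothetical.

```python
# Illustrative sketch only, NOT the authors' method; names are hypothetical.
import cv2
import numpy as np

def lamp_moved(prev_gray, cur_gray, thresh=12.0):
    """Flag frames where the whole view shifts, i.e., the lamp was repositioned."""
    return float(np.mean(cv2.absdiff(prev_gray, cur_gray))) > thresh

def realign(ref_gray, cur_gray):
    """Estimate a homography mapping the current view back onto the reference
    view from ORB feature matches (RANSAC rejects moving hands/instruments)."""
    orb = cv2.ORB_create(1000)
    k_ref, d_ref = orb.detectAndCompute(ref_gray, None)
    k_cur, d_cur = orb.detectAndCompute(cur_gray, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d_ref, d_cur)
    src = np.float32([k_cur[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k_ref[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return H

def select_least_occluded(frames, occlusion):
    """frames: (num_cams, T, H, W, 3); occlusion: (num_cams, T) scores in [0, 1].
    Returns a (T, H, W, 3) video using the least-occluded camera per frame."""
    best = occlusion.argmin(axis=0)
    return frames[best, np.arange(frames.shape[1])]
```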


Training for X-Ray Vision: Amodal Segmentation, Amodal Content Completion, and View-Invariant Object Representation from Multi-Camera Video

Moore, Alexander, Saini, Amar, Cancilla, Kylie, Poland, Doug, Carrano, Carmen

arXiv.org Artificial Intelligence

Amodal segmentation and amodal content completion require using object priors to estimate the occluded masks and features of objects in complex scenes. Until now, no dataset has provided an additional dimension of object context: multiple cameras sharing a view of a scene. We introduce MOVi-MC-AC: Multiple Object Video with Multi-Cameras and Amodal Content, the largest amodal segmentation dataset and the first amodal content dataset to date. Cluttered scenes of generic household objects are simulated in multi-camera video. MOVi-MC-AC contributes to the growing literature on object detection, tracking, and segmentation by making two new contributions to deep learning for computer vision. Multiple Camera (MC) settings, where objects can be identified and tracked across various unique camera perspectives, are rare in both synthetic and real-world video. We introduce a new complexity to synthetic video by providing consistent object IDs for detections and segmentations across both frames and multiple cameras, each with unique features and motion patterns, on a single scene. Amodal Content (AC) is a reconstructive task in which models predict the appearance of target objects through occlusions. In the amodal segmentation literature, some datasets have been released with amodal detection, tracking, and segmentation labels. While other methods rely on slow cut-and-paste schemes to generate amodal content pseudo-labels, they do not account for natural occlusions present in the modal masks. MOVi-MC-AC provides labels for ~5.8 million object instances, setting a new maximum in the amodal dataset literature, and is the first to provide ground-truth amodal content. The full dataset is available at https://huggingface.co/datasets/Amar-S/MOVi-MC-AC.
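
Since the abstract distinguishes modal (visible) from amodal (full-extent) annotations, a short sketch may help: evaluating an amodal mask prediction reduces to an IoU against the ground-truth amodal mask, and the content an amodal-completion model must hallucinate is exactly the amodal mask minus the modal one. This is generic illustrative Python, not code from the dataset's tooling, and the function names are hypothetical.

```python
# Generic illustration, not part of MOVi-MC-AC's tooling.
import numpy as np

def amodal_iou(pred, gt):
    """IoU between a predicted amodal mask and the ground-truth amodal mask."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    return np.logical_and(pred, gt).sum() / union if union else 1.0

def occluded_region(amodal, modal):
    """Pixels an amodal-content model must hallucinate: amodal minus modal."""
    return np.logical_and(amodal.astype(bool), ~modal.astype(bool))
```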


Computer says... FAULT! Wimbledon scraps line judges for first time in 148-year history as it replaces iconic umpires with AI-powered machines

Daily Mail - Science & tech

Wimbledon gets under way today with line judges scrapped for the first time in the tournament's 148-year history - replaced by AI-powered technology. Some of the sport's biggest stars have descended on south-west London for the showpiece two-week event at the All England Club - including defending singles champions Carlos Alcaraz and Barbora Krejčíková. Britain's hopes rest on Jack Draper, Katie Boulter, Cameron Norrie and Emma Raducanu, who will battle through a back injury in an attempt to win her second career Grand Slam. And all eyes are on how this year's occasion copes with a shift in the way the game is officiated, as human line judges are replaced by artificial intelligence systems. The controversial decision has left fans torn, with some praising the forward-thinking move while others dislike the idea of technology taking the place of a person.


From Hawk-Eye to AI-powered predictions on winners: The futuristic technologies powering Wimbledon 2025, revealed

Daily Mail - Science & tech

The moment tennis fans have been waiting for is nearly here - the start of Wimbledon 2025. From Monday, some of the biggest stars will battle for the most prestigious prize in tennis, including defending champions Carlos Alcaraz and Barbora Krejčíková. Britain's hopes rest on Jack Draper, Katie Boulter, Cameron Norrie and Emma Raducanu, who will battle through a back injury in an attempt to win her second career Grand Slam. Novak Djokovic aims to win his eighth Wimbledon men's singles title, matching the record set by Roger Federer, but Australian fan favourite Nick Kyrgios will be absent. This year, Wimbledon will do away with human line judges for the first time in its 148-year history, replacing them with AI.


Imilab EC6 Panorama review: Big coverage from a single camera

PCWorld

The Imilab EC6 Panorama delivers broad, high-quality coverage with smart features that punch above its price. It's a strong choice for anyone willing to trade a compact design for fewer blind spots and less reliance on multiple cameras. Covering large areas like front yards, driveways, or wide side yards often means juggling multiple cameras and feeds, and multiple apps if you buy from more than one manufacturer. But as users pile on more devices to cover every angle, managing all that footage can get cumbersome and expensive. Imilab's EC6 Panorama 3.5K Wi-Fi Spotlight Camera aims to solve that by offering broad, near-wraparound coverage from a single vantage point. It's built to reduce blind spots, and maybe the need for a second or even third camera.


Panoramic Direct LiDAR-assisted Visual Odometry

Yuan, Zikang, Xu, Tianle, Wang, Xiaoxiang, Geng, Jinni, Yang, Xin

arXiv.org Artificial Intelligence

Enhancing visual odometry with sparse depth measurements from LiDAR is a promising way to improve tracking accuracy. Most existing works use a monocular pinhole camera, which can suffer from poor robustness because a limited field of view (FOV) provides less usable information. This paper proposes a panoramic direct LiDAR-assisted visual odometry that fully associates 360-degree FOV LiDAR points with 360-degree FOV panoramic image data. Panoramic images provide more usable information, which can compensate for inaccurate pose estimation caused by insufficient texture or motion blur in a single view. In addition to constraints between a specific view at different times, constraints can also be built between different views at the same moment. Experimental results on public datasets demonstrate the benefit of our method's large FOV over state-of-the-art approaches.
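
The core operation in associating 360-degree LiDAR points with a panoramic image is projecting each 3D point to equirectangular pixel coordinates. A minimal sketch follows, assuming one common convention (x right, y down, z forward); the paper's exact projection model may differ.

```python
# Sketch of equirectangular projection under an assumed axis convention.
import numpy as np

def project_to_equirect(points_cam, width, height):
    """points_cam: (N, 3) LiDAR points already in the camera frame.
    Returns (N, 2) pixel coordinates and the (N,) ranges as sparse depth."""
    x, y, z = points_cam[:, 0], points_cam[:, 1], points_cam[:, 2]
    r = np.linalg.norm(points_cam, axis=1)
    lon = np.arctan2(x, z)                      # in (-pi, pi], 0 at image centre
    lat = np.arcsin(np.clip(y / r, -1.0, 1.0))  # in [-pi/2, pi/2], 0 at horizon
    u = (lon / (2 * np.pi) + 0.5) * width
    v = (lat / np.pi + 0.5) * height
    return np.stack([u, v], axis=1), r
```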


Multicam-SLAM: Non-overlapping Multi-camera SLAM for Indirect Visual Localization and Navigation

Li, Shenghao, Pang, Luchao, Hu, Xianglong

arXiv.org Artificial Intelligence

This paper presents a novel approach to visual simultaneous localization and mapping (SLAM) using multiple RGB-D cameras. The proposed method, Multicam-SLAM, significantly enhances the robustness and accuracy of SLAM systems by capturing more comprehensive spatial information from various perspectives. The method determines pose relationships among multiple cameras accurately without requiring overlapping fields of view. Multicam-SLAM includes a unique multi-camera model, a multi-keyframe structure, and several parallel SLAM threads. The multi-camera model allows for the integration of data from multiple cameras, while the multi-keyframes and parallel SLAM threads ensure efficient and accurate pose estimation and mapping. Extensive experiments in various environments demonstrate the superior accuracy and robustness of the proposed method compared to conventional single-camera SLAM systems. The results highlight the potential of Multicam-SLAM for more complex and challenging applications. Code is available at https://github.com/AlterPang/Multi_ORB_SLAM.
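
The geometric fact that makes non-overlapping multi-camera SLAM possible is that a fixed, pre-calibrated extrinsic lets any camera's pose be chained from a tracked reference pose, with no shared view required. Below is a minimal sketch of that chaining, using hypothetical names rather than anything from the released code.

```python
# Illustrative SE(3) chaining, not from the Multi_ORB_SLAM codebase.
import numpy as np

def se3(R, t):
    """Build a 4x4 rigid transform from a 3x3 rotation and a 3-vector."""
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return T

def chained_camera_pose(T_world_ref, T_ref_cam):
    """Pose of a secondary camera: the tracked reference camera's pose chained
    with a fixed extrinsic calibrated offline. No view overlap is needed."""
    return T_world_ref @ T_ref_cam
```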


Multi-view Disentanglement for Reinforcement Learning with Multiple Cameras

Dunion, Mhairi, Albrecht, Stefano V.

arXiv.org Artificial Intelligence

The performance of image-based Reinforcement Learning (RL) agents can vary depending on the position of the camera used to capture the images. Training on multiple cameras simultaneously, including a first-person egocentric camera, can leverage information from different camera perspectives to improve the performance of RL. However, hardware constraints may limit the availability of multiple cameras in real-world deployment, and cameras may become damaged in the real world, preventing access to all cameras that were used during training. To overcome these hardware constraints, we propose Multi-View Disentanglement (MVD), which uses multiple cameras to learn a policy that is robust to a reduction in the number of cameras and generalises to any single camera from the training set. Our approach is a self-supervised auxiliary task for RL that learns a disentangled representation from multiple cameras, with a shared representation that is aligned across all cameras to allow generalisation to a single camera, and a private representation that is camera-specific. We show experimentally that an RL agent trained on a single third-person camera is unable to learn an optimal policy in many control tasks; but our approach, benefiting from multiple cameras during training, is able to solve the task using only the same single third-person camera.
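
As a rough illustration of the shared-representation idea, the PyTorch sketch below pulls every camera's shared embedding toward their cross-camera mean, so that at deployment any single camera yields approximately the same policy input. The paper's actual objective is a self-supervised disentanglement loss that also learns a private per-camera representation; this alignment-only term and its names are simplifications for illustration.

```python
# Simplified stand-in for MVD's alignment objective, not the paper's loss.
import torch
import torch.nn.functional as F

def shared_alignment_loss(shared: torch.Tensor) -> torch.Tensor:
    """shared: (num_cams, batch, dim) shared embeddings, one per camera.
    Penalizes deviation of each camera's embedding from the cross-camera
    mean, encouraging a camera-invariant shared representation."""
    mean = shared.mean(dim=0, keepdim=True)            # (1, batch, dim)
    return F.mse_loss(shared, mean.expand_as(shared))
```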


BundledSLAM: An Accurate Visual SLAM System Using Multiple Cameras

Song, Han, Liu, Cong, Dai, Huafeng

arXiv.org Artificial Intelligence

Multi-camera SLAM systems offer many advantages, chiefly their capacity to fuse information from a broader field of view, resulting in greater robustness and improved localization accuracy. In this research, we present a significant extension and refinement of the state-of-the-art stereo SLAM system ORB-SLAM2, with the objective of attaining even higher precision. To accomplish this, we begin by mapping measurements from all cameras onto a virtual camera termed the BundledFrame. This virtual camera is engineered to adapt to multi-camera configurations, enabling the effective fusion of data captured from multiple cameras. Additionally, we harness extrinsic parameters in the bundle adjustment (BA) process to achieve precise trajectory estimation. Furthermore, we conduct an extensive analysis of the role of BA in multi-camera scenarios, delving into its impact on tracking, local mapping, and global optimization. Our experimental evaluation entails comprehensive comparisons against ground-truth data and a state-of-the-art SLAM system on the EuRoC datasets. The consistent results of our evaluations demonstrate the superior accuracy of our system in comparison to existing approaches.
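
A sketch of how the BundledFrame idea plays out in bundle adjustment: a world landmark is taken through the BundledFrame pose and a fixed per-camera extrinsic into each camera, and the reprojection error is measured there. This is an illustrative reconstruction from the abstract, not the authors' code; the pinhole model and all names are assumptions.

```python
# Illustrative BA residual with fixed extrinsics; names are hypothetical.
import numpy as np

def reprojection_residual(p_world, T_world_bundled, T_bundled_cam, K, uv_obs):
    """World landmark -> BundledFrame pose -> fixed camera extrinsic ->
    pinhole projection, compared with the observed pixel. T_* are 4x4
    transforms; K is the 3x3 intrinsic matrix of camera i."""
    T_world_cam = T_world_bundled @ T_bundled_cam   # pose of camera i in world
    p_cam = np.linalg.inv(T_world_cam) @ np.append(p_world, 1.0)
    uv = (K @ (p_cam[:3] / p_cam[2]))[:2]           # perspective division
    return uv - uv_obs                              # minimized over poses/points
```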


Online Multi Camera-IMU Calibration

Hartzer, Jacob, Saripalli, Srikanth

arXiv.org Artificial Intelligence

Visual-inertial navigation systems can accurately localize mobile systems within complex environments that preclude the use of global navigation satellite systems. However, these navigation systems rely on accurate and up-to-date temporospatial calibrations of the sensors in use, so online estimators for these parameters are useful for resilient systems. This paper presents an extension to existing Kalman-filter-based frameworks for estimating and calibrating the extrinsic parameters of multi-camera IMU systems. In addition to extending the filter framework to include multiple camera sensors, the measurement model was reformulated to make use of measurement data that is typically made available by fiducial detection software. A secondary filter layer was used to estimate time translation parameters without closed-loop feedback of sensor data. Experimental calibration results, including with cameras with non-overlapping fields of view, validate the stability and accuracy of the filter formulation compared to offline methods. Finally, the generalized filter code has been open-sourced and is available online.
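
The abstract's filter augments a Kalman framework with extrinsic states. For readers unfamiliar with the mechanics, below is a generic EKF measurement update; in this setting the state x would stack navigation states with the camera-IMU extrinsics, and the Jacobian H would come from the paper's reformulated fiducial measurement model. All names here are generic, not from the released code.

```python
# Textbook EKF measurement update, shown for orientation only.
import numpy as np

def ekf_update(x, P, z, h_pred, H, R):
    """x: state (nav states stacked with extrinsics); P: its covariance;
    z: measurement; h_pred: predicted measurement; H: measurement Jacobian;
    R: measurement noise covariance."""
    y = z - h_pred                              # innovation
    S = H @ P @ H.T + R                         # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)              # Kalman gain
    x_new = x + K @ y
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_new, P_new
```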